Unsupervised mining of frequent tags for clinical eligibility text indexing
Abstract
Clinical text, such as clinical trial eligibility criteria, is largely underused in state-of-the-art medical search engines due to difficulties of accurate parsing. This paper proposes a novel methodology to derive a semantic index for clinical eligibility documents based on a controlled vocabulary of frequent tags, which are automatically mined from the text. We applied this method to eligibility criteria on ClinicalTrials.gov and report that frequent tags (1) define an effective and efficient index of clinical trials and (2) are unlikely to grow radically when the repository increases. We proposed to apply the semantic index to filter clinical trial search results and we concluded that frequent tags reduce the result space more efficiently than an uncontrolled set of UMLS concepts. Overall, unsupervised mining of frequent tags from clinical text leads to an effective semantic index for the clinical eligibility documents and promotes their computational reuse.
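The pipeline the abstract describes (mine frequent tags, index documents by them, filter search results by selected tags) can be illustrated with a minimal sketch. This is not the authors' implementation: the paper derives tags from the eligibility text in relation to UMLS concepts, whereas the sketch below substitutes plain word n-grams as candidate tags and a document-frequency threshold as the notion of "frequent". The function names, the `min_doc_freq` parameter, and the tokenization are all hypothetical choices made for illustration, and no stopword or concept normalization is attempted.

```python
from collections import Counter, defaultdict


def ngrams(text, max_n=2):
    """Yield lowercase word n-grams (n = 1..max_n) from a criteria string.

    Assumption: n-grams stand in for the concept-based tags used in the paper.
    """
    words = [w.strip(".,;:()").lower() for w in text.split()]
    words = [w for w in words if w]
    for n in range(1, max_n + 1):
        for i in range(len(words) - n + 1):
            yield " ".join(words[i:i + n])


def mine_frequent_tags(docs, min_doc_freq=2, max_n=2):
    """Return candidate tags found in at least min_doc_freq documents.

    docs maps a trial identifier to its eligibility criteria text.
    """
    doc_freq = Counter()
    for text in docs.values():
        # Count each tag once per document (document frequency).
        doc_freq.update(set(ngrams(text, max_n)))
    return {tag for tag, df in doc_freq.items() if df >= min_doc_freq}


def build_index(docs, tags, max_n=2):
    """Build an inverted index: tag -> set of trial identifiers."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for gram in set(ngrams(text, max_n)):
            if gram in tags:
                index[gram].add(doc_id)
    return index


def filter_trials(index, candidates, selected_tags):
    """Keep only the candidate trials indexed under every selected tag."""
    result = set(candidates)
    for tag in selected_tags:
        result &= index.get(tag, set())
    return result
```

Because the tag vocabulary is restricted to items that recur across documents, each selected tag is guaranteed to match more than one trial, which is one intuition for why a controlled set of frequent tags can narrow a result list in fewer steps than an uncontrolled concept set.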
Similar resources
eTACTS: an Eligibility Tag Cloud-based Clinical Trial Search Engine
We present eTACTS, a Web search engine that allows users to select eligibility tags to filter clinical trial search results. A controlled vocabulary of frequent tags is automatically mined by cross-processing clinical trial eligibility criteria and used to index all the trials in ClinicalTrials.gov. After an initial search, these tags are presented to the users as an interactive tag cloud for i...
Application of a Mining Algorithm to Finding Frequent Patterns in a Text Corpus: A Case Study of the Arabic
Information repositories containing text data of different languages are abundant on the World Wide Web. Digital corpora of sacred text of Islam related to Quran containing Arabic language are also publicly available. The availability of these corpora and intelligent application to analyze them are vital to better comprehend the religious text of Islam. In this paper I propose a method of repre...
Interesting-Phrase Mining for Ad-Hoc Text Analytics
Large text corpora with news, customer mail and reports, or Web 2.0 contributions offer a great potential for enhancing business-intelligence applications. We propose a framework for performing text analytics on such data in a versatile, efficient, and scalable manner. While much of the prior literature has emphasized mining keywords or tags in blogs or social-tagging communities, we emphasize ...
Scalable Phrase Mining for Ad-hoc Text Analytics
Large text corpora with news, customer mail and reports, or Web 2.0 contributions offer a great potential for enhancing business-intelligence applications. We propose a framework for performing text analytics on such data in a versatile, efficient, and scalable manner. While much of the prior literature has emphasized mining keywords or tags in blogs or social-tagging communities, we emphasize ...
Single Document Keyphrase Extraction Using Label Information
Keyphrases have found wide ranging application in NLP and IR tasks such as document summarization, indexing, labeling, clustering and classification. In this paper we pose the problem of extracting label specific keyphrases from a document which has document level metadata associated with it namely labels or tags (i.e. multi-labeled document). Unlike other, supervised or unsupervised, methods f...
Journal:
- Journal of Biomedical Informatics
Volume 46, Issue 6
Publication date: 2013